Supplemental

A Comparison to Latent Growth Curve Models

It is common in Structural Equation Modeling (SEM) to handle longitudinal data with a Latent Growth Curve (LGC) model. LGC models are, in a sense, just a different form of the very commonly used mixed model framework. In some ways they are more flexible, mostly because the SEM framework allows for indirect effects and other complex covariate relationships. In other ways they are less flexible: they require balanced data, and they make it harder to estimate nonlinear relationships, handle data with many time points, or deal with time-varying covariates. With appropriate tools there is little one can't do with the standard mixed model approach relative to the SEM approach, and interpretation is typically easier. As such, I'd recommend sticking with the standard mixed model framework unless you really need the SEM machinery.

To best understand a growth curve model, I still think it’s instructive to see it from the mixed model perspective, where things are mostly interpretable from what you know from a standard linear model. We will use our GPA example from before, and one can refer to the [appendix][Appendix] for more detail.

Random Effects as Latent Variables

As before, we assume the following for the GPA model. As a simple starting point, we merely model a trend over time (occasion, i.e. the 6 semesters) and include random effects due to student for both the intercept and occasion. In this setting we treat time as numeric, but one could treat the occasion variable as categorical.

\[\mathcal{GPA} = (b_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{intercept}}) + (b_{\mathrm{occ}} + \mathrm{re}_{\mathrm{occasion}})\cdot \mathrm{occasion} + \epsilon\]

\[\mathrm{re}_{\mathrm{intercept}} \sim \mathcal{N}(0, \tau)\] \[\mathrm{re}_{\mathrm{occasion}} \sim \mathcal{N}(0, \varphi)\] \[\epsilon \sim \mathcal{N}(0, \sigma)\]

Thus the student effects for the intercept and slope are random, and specifically are normally distributed with mean of zero and some estimated standard deviation (\(\tau\) and \(\varphi\), respectively). We consider these effects as coming from unspecified, or latent, causes due to student. In addition, we have the usual residual error \(\epsilon\).

The corresponding model may be run using lme4 as follows.

load('data/gpa.RData')   

# if you haven't downloaded the workshop RStudio project
# load(url('https://github.com/m-clark/mixed-models-with-R/raw/master/data/gpa.RData?raw=true'))

library(lme4)
mixed_init = lmer(gpa ~ occasion + (1 + occasion|student), data = gpa)
# summary(mixed_init)

With regard to the output, the fixed (population-average) effects are the \(b_{\mathrm{intercept}}\) and \(b_{\mathrm{occ}}\) in the previous model depiction. The standard deviations of the random effects are the \(\tau\), \(\varphi\) and \(\epsilon\).

effect    group     term                       estimate  std.error  statistic
fixed               (Intercept)                    2.60       0.02     141.59
fixed               occasion                       0.11       0.01      18.07
ran_pars  student   sd__(Intercept)                0.21
ran_pars  student   sd__occasion                   0.07
ran_pars  student   cor__(Intercept).occasion     -0.10
ran_pars  Residual  sd__Observation                0.21

Random Effects in SEM

In SEM, we specify the latent linear model as follows.

\[Y = b_{\mathrm{intercept}} + \lambda F + \epsilon\] \[F \sim \mathcal{N}(0, \tau)\]

In the above, \(Y\) is our observed variable, \(b_{\mathrm{intercept}}\) is the intercept as in a standard linear regression model, and \(\lambda\) is the coefficient (a loading in factor analysis/SEM terminology) for the latent variable, represented as \(F\). The latent variable is assumed normally distributed, with zero mean and some estimated variance, just like the random effects in mixed models.

Note that if \(\lambda = 1\), the right-hand side becomes \(b_{\mathrm{intercept}} + F\), which is indistinguishable from the random intercept portion of the mixed model (\(b_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{intercept}}\)). Through this we can start to get a sense of random effects as latent variables (or vice versa). Indeed, mixed models have ties to many other kinds of models (e.g. spatial, additive), because those models also add a 'random' component to the model in some fashion.
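To make the correspondence explicit, we can write the two formulations side by side with \(\lambda\) fixed to 1; the terms match one for one, with \(F\) playing exactly the role of \(\mathrm{re}_{\mathrm{intercept}}\):

\[Y = b_{\mathrm{intercept}} + 1\cdot F + \epsilon, \qquad F \sim \mathcal{N}(0, \tau)\]

\[Y = (b_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{intercept}}) + \epsilon, \qquad \mathrm{re}_{\mathrm{intercept}} \sim \mathcal{N}(0, \tau)\]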

Running a Growth Curve Model

The graphical model for the standard LGC resembles that of a confirmatory factor analysis (CFA) with two latent variables/factors. The observed, or manifest, measures are the dependent variable values at each respective time point. However, growth curve models will look a bit different from a typical SEM, because we have to fix the factor loadings to specific values in order to make the LGC work. As we will see, this also leads to non-standard output relative to other SEM models, since there is nothing to estimate for the many fixed parameters.

More specifically, we’ll have a latent variable representing the random intercepts, as well as one representing the random slopes for the longitudinal trend (time), which in the GPA data is the semester indicator. All loadings for the intercept factor are 1. The loadings for the effect of time are arbitrary, but should accurately reflect the time spacing, and typically it is good to start at zero, so that the zero has a meaningful interpretation.

Wide Data

For the LGC, our data needs to be in wide format, where each row represents a person and we have separate columns for each time point of the target variable, as well as for any other variable that varies over time. This contrasts with the long format we use for the mixed model, where rows represent observations at a given time point. We can use the spread function from tidyr to help with that. We end up with a data frame of 200 rows (one per student) and a column for each semester's GPA (0 through 5 for six semesters), denoted gpa_*.

library(dplyr)
library(tidyr)

gpa_wide = gpa %>% 
  select(student, sex, highgpa, occasion, gpa) %>% 
  spread(key = occasion, value = gpa) %>% 
  rename_at(vars(`0`,`1`,`2`,`3`,`4`,`5`), function(x) glue::glue('gpa_{x}'))
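
In more recent versions of tidyr, spread has been superseded by pivot_wider. An equivalent reshape (a sketch, assuming the same gpa data) would be:

library(dplyr)
library(tidyr)

# same reshape with the newer pivot_wider interface;
# names_prefix adds 'gpa_' to each occasion column directly
gpa_wide = gpa %>% 
  select(student, sex, highgpa, occasion, gpa) %>% 
  pivot_wider(names_from = occasion, values_from = gpa, names_prefix = 'gpa_')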

We’ll use lavaan for our excursion into LGC. It requires its own model syntax, though lavaan tries to stay close to R's regression formula style. The names intercept and slope are arbitrary. The =~ operator denotes that the left-hand side is the latent variable and the right-hand side contains the observed/manifest variables.

lgc_init_model = '

  intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
  slope     =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
  
'

Now we’re ready to run the model. Note that lavaan has a specific function, growth, to use for these models. It doesn’t spare us any effort for the model syntax, but does make it unnecessary to set various arguments for the more generic sem and lavaan functions.

library(lavaan)
lgc_init = growth(lgc_init_model, data = gpa_wide)
summary(lgc_init)
lavaan 0.6-3 ended normally after 73 iterations

  Optimization method                           NLMINB
  Number of free parameters                         11

  Number of observations                           200

  Estimator                                         ML
  Model Fit Test Statistic                      43.945
  Degrees of freedom                                16
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept =~                                        
    gpa_0             1.000                           
    gpa_1             1.000                           
    gpa_2             1.000                           
    gpa_3             1.000                           
    gpa_4             1.000                           
    gpa_5             1.000                           
  slope =~                                            
    gpa_0             0.000                           
    gpa_1             1.000                           
    gpa_2             2.000                           
    gpa_3             3.000                           
    gpa_4             4.000                           
    gpa_5             5.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~~                                        
    slope             0.002    0.002    1.629    0.103

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.000                           
   .gpa_1             0.000                           
   .gpa_2             0.000                           
   .gpa_3             0.000                           
   .gpa_4             0.000                           
   .gpa_5             0.000                           
    intercept         2.598    0.018  141.956    0.000
    slope             0.106    0.005   20.338    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.080    0.010    8.136    0.000
   .gpa_1             0.071    0.008    8.799    0.000
   .gpa_2             0.054    0.006    9.039    0.000
   .gpa_3             0.029    0.003    8.523    0.000
   .gpa_4             0.015    0.002    5.986    0.000
   .gpa_5             0.016    0.003    4.617    0.000
    intercept         0.035    0.007    4.947    0.000
    slope             0.003    0.001    5.645    0.000

Much of the output is blank, which is needless clutter, but we do get the five parameter estimates we are interested in.
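As an aside, lavaan's parameterEstimates function returns the results as a data frame, which can be easier to work with than the summary printout. Filtering out rows with a standard error of zero (the fixed loadings) is one quick way to see only the freely estimated parameters (a sketch, assuming the lgc_init model above):

library(lavaan)

pe = parameterEstimates(lgc_init)   # one row per parameter
subset(pe, se > 0)                  # keep only the freely estimated parameters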

Start with the ‘intercepts’:

Intercepts:
                   Estimate  Std.Err  Z-value  P(>|z|)

    intercept         2.598    0.018  141.956    0.000
    slope             0.106    0.005   20.338    0.000

It might seem odd to call your fixed effects 'intercepts', but it makes sense if we think of this as a multilevel model as depicted previously, where we actually broke out the random effects as a separate model. The estimates here are essentially spot on with our mixed model estimates.

library(lme4)
gpa_mixed = lmer(gpa ~ occasion + (1 + occasion | student), data=gpa)
summary(gpa_mixed)
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ occasion + (1 + occasion | student)
   Data: gpa

REML criterion at convergence: 261

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2695 -0.5377 -0.0128  0.5326  3.1939 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 student  (Intercept) 0.045193 0.21259       
          occasion    0.004504 0.06711  -0.10
 Residual             0.042388 0.20588       
Number of obs: 1200, groups:  student, 200

Fixed effects:
            Estimate Std. Error t value
(Intercept) 2.599214   0.018357  141.59
occasion    0.106314   0.005885   18.07

Correlation of Fixed Effects:
         (Intr)
occasion -0.345
# fixef(gpa_mixed)

Now let’s look at the variance estimates. By default, the LGC estimates a separate residual variance for each time point, which distinguishes the two approaches, though not necessarily so: we could fix them to be identical here, or conversely allow them to be estimated separately in the mixed model framework. Just know that this is why the results are not identical (along with the default estimation approaches, which also differ).

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~~                                        
    slope             0.002    0.002    1.629    0.103
    
Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.080    0.010    8.136    0.000
   .gpa_1             0.071    0.008    8.799    0.000
   .gpa_2             0.054    0.006    9.039    0.000
   .gpa_3             0.029    0.003    8.523    0.000
   .gpa_4             0.015    0.002    5.986    0.000
   .gpa_5             0.016    0.003    4.617    0.000
    intercept         0.035    0.007    4.947    0.000
    slope             0.003    0.001    5.645    0.000
VarCorr(gpa_mixed)
 Groups   Name        Std.Dev. Corr  
 student  (Intercept) 0.212587       
          occasion    0.067111 -0.098
 Residual             0.205883       

The differences provide some insight. LGC by default assumes heterogeneous variance for each time point. Mixed models by default assume the same variance for each time point, but can allow them to be estimated separately in most modeling packages.
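For reference, the mixed model side can estimate a separate residual variance per occasion via nlme, using a varIdent weights specification (a sketch, assuming the long-format gpa data; lme4 does not support this directly):

library(nlme)

mixed_hetero = lme(
  gpa ~ occasion,
  random  = ~ 1 + occasion | student,
  weights = varIdent(form = ~ 1 | occasion),  # one residual variance per occasion
  data    = gpa
)
# summary(mixed_hetero)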

As an example, if we fix the variances to be equal, the models are now identical.

model = "
  intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
  slope =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
  
  gpa_0 ~~ residual*gpa_0
  gpa_1 ~~ residual*gpa_1
  gpa_2 ~~ residual*gpa_2
  gpa_3 ~~ residual*gpa_3
  gpa_4 ~~ residual*gpa_4
  gpa_5 ~~ residual*gpa_5
"

growthCurveModel = growth(model, data=gpa_wide)
summary(growthCurveModel)
lavaan 0.6-3 ended normally after 51 iterations

  Optimization method                           NLMINB
  Number of free parameters                         11
  Number of equality constraints                     5

  Number of observations                           200

  Estimator                                         ML
  Model Fit Test Statistic                     191.409
  Degrees of freedom                                21
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept =~                                        
    gpa_0             1.000                           
    gpa_1             1.000                           
    gpa_2             1.000                           
    gpa_3             1.000                           
    gpa_4             1.000                           
    gpa_5             1.000                           
  slope =~                                            
    gpa_0             0.000                           
    gpa_1             1.000                           
    gpa_2             2.000                           
    gpa_3             3.000                           
    gpa_4             4.000                           
    gpa_5             5.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~~                                        
    slope            -0.001    0.002   -0.834    0.404

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.000                           
   .gpa_1             0.000                           
   .gpa_2             0.000                           
   .gpa_3             0.000                           
   .gpa_4             0.000                           
   .gpa_5             0.000                           
    intercept         2.599    0.018  141.947    0.000
    slope             0.106    0.006   18.111    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0   (rsdl)    0.042    0.002   20.000    0.000
   .gpa_1   (rsdl)    0.042    0.002   20.000    0.000
   .gpa_2   (rsdl)    0.042    0.002   20.000    0.000
   .gpa_3   (rsdl)    0.042    0.002   20.000    0.000
   .gpa_4   (rsdl)    0.042    0.002   20.000    0.000
   .gpa_5   (rsdl)    0.042    0.002   20.000    0.000
    intrcpt           0.045    0.007    6.599    0.000
    slope             0.004    0.001    6.387    0.000

Compare to the lme4 output.

 Groups   Name        Variance  Corr  
 student  (Intercept) 0.0451934       
          occasion    0.0045039 -0.098
 Residual             0.0423879       

Random Intercepts with Homogeneous Variance

How can we put these models on the same footing? Let’s take a step back and do a model with only random intercepts. In this case, time is an observed measure with no person-specific variability.

We can do this by fixing the slope ‘factor’ to have zero variance. Note also that in the LGC, each time point of the gpa outcome has its own unique (residual) variance. Conversely, this is constant in the mixed model setting, i.e. we have only one estimate of the residual variance, which does not vary by occasion. We deal with this in the LGC by giving the parameter a name and then applying it to each time point.

lgc_ran_int_model = '

 intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
 slope =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
 
 slope ~~ 0*slope                  # slope variance is zero
 intercept ~~ 0*slope              # no covariance with intercept factor
 
 
 gpa_0 ~~ resid*gpa_0    # same residual variance for each time point
 gpa_1 ~~ resid*gpa_1
 gpa_2 ~~ resid*gpa_2
 gpa_3 ~~ resid*gpa_3
 gpa_4 ~~ resid*gpa_4
 gpa_5 ~~ resid*gpa_5
 
'

Now all time points will share a single residual variance estimate. Let’s run the LGC.

lgc_ran_int = growth(lgc_ran_int_model, data = gpa_wide)
summary(lgc_ran_int, nd=4)  # increase the number of digits shown
lavaan 0.6-3 ended normally after 36 iterations

  Optimization method                           NLMINB
  Number of free parameters                          9
  Number of equality constraints                     5

  Number of observations                           200

  Estimator                                         ML
  Model Fit Test Statistic                     338.824
  Degrees of freedom                                23
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                    Estimate   Std.Err   z-value   P(>|z|)
  intercept =~                                            
    gpa_0             1.0000                              
    gpa_1             1.0000                              
    gpa_2             1.0000                              
    gpa_3             1.0000                              
    gpa_4             1.0000                              
    gpa_5             1.0000                              
  slope =~                                                
    gpa_0             0.0000                              
    gpa_1             1.0000                              
    gpa_2             2.0000                              
    gpa_3             3.0000                              
    gpa_4             4.0000                              
    gpa_5             5.0000                              

Covariances:
                    Estimate   Std.Err   z-value   P(>|z|)
  intercept ~~                                            
    slope             0.0000                              

Intercepts:
                    Estimate   Std.Err   z-value   P(>|z|)
   .gpa_0             0.0000                              
   .gpa_1             0.0000                              
   .gpa_2             0.0000                              
   .gpa_3             0.0000                              
   .gpa_4             0.0000                              
   .gpa_5             0.0000                              
    intercept         2.5992    0.0217  120.0471    0.0000
    slope             0.1063    0.0041   26.1094    0.0000

Variances:
                    Estimate   Std.Err   z-value   P(>|z|)
    slope             0.0000                              
   .gpa_0   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_1   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_2   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_3   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_4   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_5   (resd)    0.0580    0.0026   22.3607    0.0000
    intrcpt           0.0634    0.0073    8.6605    0.0000

Compare it to the corresponding mixed model.

mixed_ran_int = lmer(gpa ~ occasion + (1|student), data=gpa)
## summary(mixed_ran_int)
effect    group     term             estimate  std.error  statistic
fixed               (Intercept)         2.599      0.022    119.800
fixed               occasion            0.106      0.004     26.096
ran_pars  student   sd__(Intercept)     0.252
ran_pars  Residual  sd__Observation     0.241

Now we have essentially identical results to mixed_ran_int. The default estimation methods differ (REML for lmer, maximum likelihood for lavaan), resulting in differences several decimal places out, but these are not meaningful. We can use the same estimator for both, though slight numerical differences will remain.
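To put the two on the same estimation footing, we can ask lmer for maximum likelihood rather than REML, matching lavaan's default (standard lme4 usage):

library(lme4)

# ML estimation to match lavaan's default maximum likelihood estimator
mixed_ran_int_ml = lmer(gpa ~ occasion + (1 | student), data = gpa, REML = FALSE)
# summary(mixed_ran_int_ml)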

Random Intercepts and Slopes with Homogeneous Variance

Now let’s let the slope for occasion vary. We can just delete or comment out the syntax related to the (co-) variance. By default slopes and intercepts are allowed to correlate as in the mixed model. We will continue to keep the variance constant.

lgc_ran_int_ran_slope_model = '

 intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
 slope =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
 
 # slope ~~ 0*slope                  # slope variance is zero
 # intercept ~~ 0*slope              # no covariance
 
 
 gpa_0 ~~ resid*gpa_0    # same residual variance for each time point
 gpa_1 ~~ resid*gpa_1
 gpa_2 ~~ resid*gpa_2
 gpa_3 ~~ resid*gpa_3
 gpa_4 ~~ resid*gpa_4
 gpa_5 ~~ resid*gpa_5
'
lgc_ran_int_ran_slope = growth(lgc_ran_int_ran_slope_model, data = gpa_wide)
summary(lgc_ran_int_ran_slope, nd=4)  # increase the number of digits shown
lavaan 0.6-3 ended normally after 51 iterations

  Optimization method                           NLMINB
  Number of free parameters                         11
  Number of equality constraints                     5

  Number of observations                           200

  Estimator                                         ML
  Model Fit Test Statistic                     191.409
  Degrees of freedom                                21
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                    Estimate   Std.Err   z-value   P(>|z|)
  intercept =~                                            
    gpa_0             1.0000                              
    gpa_1             1.0000                              
    gpa_2             1.0000                              
    gpa_3             1.0000                              
    gpa_4             1.0000                              
    gpa_5             1.0000                              
  slope =~                                                
    gpa_0             0.0000                              
    gpa_1             1.0000                              
    gpa_2             2.0000                              
    gpa_3             3.0000                              
    gpa_4             4.0000                              
    gpa_5             5.0000                              

Covariances:
                    Estimate   Std.Err   z-value   P(>|z|)
  intercept ~~                                            
    slope            -0.0014    0.0016   -0.8337    0.4045

Intercepts:
                    Estimate   Std.Err   z-value   P(>|z|)
   .gpa_0             0.0000                              
   .gpa_1             0.0000                              
   .gpa_2             0.0000                              
   .gpa_3             0.0000                              
   .gpa_4             0.0000                              
   .gpa_5             0.0000                              
    intercept         2.5992    0.0183  141.9471    0.0000
    slope             0.1063    0.0059   18.1113    0.0000

Variances:
                    Estimate   Std.Err   z-value   P(>|z|)
   .gpa_0   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_1   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_2   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_3   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_4   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_5   (resd)    0.0424    0.0021   20.0000    0.0000
    intrcpt           0.0449    0.0068    6.5992    0.0000
    slope             0.0045    0.0007    6.3874    0.0000
mixed_ran_int_ran_slope = lmer(gpa ~ occasion + (1 + occasion|student), data=gpa)
summary(mixed_ran_int_ran_slope)
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ occasion + (1 + occasion | student)
   Data: gpa

REML criterion at convergence: 261

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2695 -0.5377 -0.0128  0.5326  3.1939 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 student  (Intercept) 0.045193 0.21259       
          occasion    0.004504 0.06711  -0.10
 Residual             0.042388 0.20588       
Number of obs: 1200, groups:  student, 200

Fixed effects:
            Estimate Std. Error t value
(Intercept) 2.599214   0.018357  141.59
occasion    0.106314   0.005885   18.07

Correlation of Fixed Effects:
         (Intr)
occasion -0.345

In addition, the random coefficient estimates from the mixed model correlate perfectly with those of the latent variables.

 Int_mix  Slope_mix  Int_lgc  Slope_lgc
    2.40       0.17     2.40       0.17
    2.39       0.10     2.39       0.10
    2.59       0.15     2.59       0.15
    2.51       0.06     2.51       0.06
    2.69       0.08     2.69       0.08
    2.39       0.06     2.39       0.06
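
Per-student values like these can be obtained with coef on the lmer fit and lavPredict (model-based factor scores) on the lavaan fit; a sketch, assuming the models fit above:

library(lme4)
library(lavaan)

# per-student intercepts and slopes from the mixed model (fixed + random effects)
re_mix = coef(mixed_ran_int_ran_slope)$student

# factor scores for the intercept and slope latent variables from the LGC
re_lgc = lavPredict(lgc_ran_int_ran_slope)

# the slope estimates from the two models correlate perfectly
cor(re_mix[, 'occasion'], re_lgc[, 'slope'])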

Note that the intercept-slope relationship in the LGC is expressed as a covariance. If we want correlation, we just ask for standardized output.

summary(lgc_ran_int_ran_slope, nd=4, std=T)
lavaan 0.6-3 ended normally after 51 iterations

  Optimization method                           NLMINB
  Number of free parameters                         11
  Number of equality constraints                     5

  Number of observations                           200

  Estimator                                         ML
  Model Fit Test Statistic                     191.409
  Degrees of freedom                                21
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                    Estimate   Std.Err   z-value   P(>|z|)    Std.lv   Std.all
  intercept =~                                                                
    gpa_0             1.0000                                  0.2118    0.7170
    gpa_1             1.0000                                  0.2118    0.7100
    gpa_2             1.0000                                  0.2118    0.6709
    gpa_3             1.0000                                  0.2118    0.6132
    gpa_4             1.0000                                  0.2118    0.5508
    gpa_5             1.0000                                  0.2118    0.4920
  slope =~                                                                    
    gpa_0             0.0000                                  0.0000    0.0000
    gpa_1             1.0000                                  0.0669    0.2241
    gpa_2             2.0000                                  0.1337    0.4235
    gpa_3             3.0000                                  0.2006    0.5807
    gpa_4             4.0000                                  0.2674    0.6955
    gpa_5             5.0000                                  0.3343    0.7764

Covariances:
                    Estimate   Std.Err   z-value   P(>|z|)    Std.lv   Std.all
  intercept ~~                                                                
    slope            -0.0014    0.0016   -0.8337    0.4045   -0.0963   -0.0963

Intercepts:
                    Estimate   Std.Err   z-value   P(>|z|)    Std.lv   Std.all
   .gpa_0             0.0000                                  0.0000    0.0000
   .gpa_1             0.0000                                  0.0000    0.0000
   .gpa_2             0.0000                                  0.0000    0.0000
   .gpa_3             0.0000                                  0.0000    0.0000
   .gpa_4             0.0000                                  0.0000    0.0000
   .gpa_5             0.0000                                  0.0000    0.0000
    intercept         2.5992    0.0183  141.9471    0.0000   12.2724   12.2724
    slope             0.1063    0.0059   18.1113    0.0000    1.5903    1.5903

Variances:
                    Estimate   Std.Err   z-value   P(>|z|)    Std.lv   Std.all
   .gpa_0   (resd)    0.0424    0.0021   20.0000    0.0000    0.0424    0.4859
   .gpa_1   (resd)    0.0424    0.0021   20.0000    0.0000    0.0424    0.4763
   .gpa_2   (resd)    0.0424    0.0021   20.0000    0.0000    0.0424    0.4253
   .gpa_3   (resd)    0.0424    0.0021   20.0000    0.0000    0.0424    0.3554
   .gpa_4   (resd)    0.0424    0.0021   20.0000    0.0000    0.0424    0.2867
   .gpa_5   (resd)    0.0424    0.0021   20.0000    0.0000    0.0424    0.2287
    intrcpt           0.0449    0.0068    6.5992    0.0000    1.0000    1.0000
    slope             0.0045    0.0007    6.3874    0.0000    1.0000    1.0000

The Std.all column is the one we typically look at.
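If we only want the standardized results, lavaan also provides the standardizedSolution function, which returns them as a data frame rather than a printout:

library(lavaan)

# data frame of standardized estimates (est.std), with standard errors and CIs
standardizedSolution(lgc_ran_int_ran_slope)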

Other covariates

Within these models we can have cluster level covariates or those that vary over time.

Cluster level

Mixed model

To add a cluster-level covariate to a mixed model, it looks something like this (ignoring the lowest-level subscript; \(b_0\) is the intercept):

standard random intercept

\[y = b_{0c} + b_1*\mathrm{occasion} + e \]

\[b_{0c} = b_0 + u_{intercept}\]

Plugging in becomes:

\[y = b_0 + b_1*\mathrm{occasion} + u_{intercept} + e \]

subject level covariate added

\[b_{0c} = b_0 + c_1*\mathrm{sex} + u_{intercept}\]

But if we plug that into our level 1 model, it just becomes:

\[y = b_0 + c_1*\mathrm{sex} + b_1*\mathrm{occasion} + u_{intercept} + e\]

In our previous modeling syntax it would look like this:

gpa_mixed = lmer(gpa ~ sex + occasion + (occasion|student), data = gpa)

We’d have a fixed effect for sex and interpret it just like in the standard setting.

LGC

With LGC, there is a tendency to interpret the model as an SEM, and certainly one can. But adding covariates typically causes confusion for those not familiar with mixed models. We literally regress the intercept and slope latent variables on the cluster-level covariates.

Furthermore, people often automatically include an effect of the cluster-level covariate on the slope factor as well as the intercept factor. In the mixed model this would result in the following:

subject level covariate added for slopes

\[b_{1c} = b_1 + d_1*\mathrm{sex} + u_{slope}\]

And after plugging in:

\[y = b_0 + c_1*\mathrm{sex} + b_1*\mathrm{occasion} + d_1*\mathrm{sex}*\mathrm{occasion} + u_{intercept} + u_{slope}*\mathrm{occasion} + e\]

So we can see that this amounts to an interaction between sex and occasion. It is not required, but one should add it if actually interested in that interaction.
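In mixed model syntax, the corresponding model is just the standard interaction formula:

library(lme4)

# cluster-level covariate affecting both the intercept and the slope
# via the sex:occasion interaction implied by sex * occasion
gpa_mixed_sex = lmer(gpa ~ sex * occasion + (1 + occasion | student), data = gpa)
# summary(gpa_mixed_sex)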

lgc_cluster_level_model <- '

  intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
  slope =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
  
  # regressions
  intercept ~ sex
  slope ~ sex 


  gpa_0 ~~ resid*gpa_0    # same residual variance for each time point
  gpa_1 ~~ resid*gpa_1
  gpa_2 ~~ resid*gpa_2
  gpa_3 ~~ resid*gpa_3
  gpa_4 ~~ resid*gpa_4
  gpa_5 ~~ resid*gpa_5  
'

lgc_cluster_level = growth(lgc_cluster_level_model, data = gpa_wide)
summary(lgc_cluster_level)
lavaan 0.6-3 ended normally after 47 iterations

  Optimization method                           NLMINB
  Number of free parameters                         13
  Number of equality constraints                     5

  Number of observations                           200

  Estimator                                         ML
  Model Fit Test Statistic                     193.415
  Degrees of freedom                                25
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept =~                                        
    gpa_0             1.000                           
    gpa_1             1.000                           
    gpa_2             1.000                           
    gpa_3             1.000                           
    gpa_4             1.000                           
    gpa_5             1.000                           
  slope =~                                            
    gpa_0             0.000                           
    gpa_1             1.000                           
    gpa_2             2.000                           
    gpa_3             3.000                           
    gpa_4             4.000                           
    gpa_5             5.000                           

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~                                         
    sex               0.076    0.036    2.083    0.037
  slope ~                                             
    sex               0.029    0.012    2.499    0.012

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .intercept ~~                                        
   .slope            -0.002    0.002   -1.184    0.237

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.000                           
   .gpa_1             0.000                           
   .gpa_2             0.000                           
   .gpa_3             0.000                           
   .gpa_4             0.000                           
   .gpa_5             0.000                           
   .intercept         2.484    0.058   42.671    0.000
   .slope             0.062    0.019    3.349    0.001

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0   (resd)    0.042    0.002   20.000    0.000
   .gpa_1   (resd)    0.042    0.002   20.000    0.000
   .gpa_2   (resd)    0.042    0.002   20.000    0.000
   .gpa_3   (resd)    0.042    0.002   20.000    0.000
   .gpa_4   (resd)    0.042    0.002   20.000    0.000
   .gpa_5   (resd)    0.042    0.002   20.000    0.000
   .intrcpt           0.043    0.007    6.525    0.000
   .slope             0.004    0.001    6.273    0.000

Applied researchers commonly have difficulty interpreting the model due to past experience with SEM. While the intercept and slope are latent variables in a technical sense, they are not merely latent variables or underlying constructs as in typical SEM practice; they are random effects. It doesn’t help that the output can be confusing, because now one has an ‘intercept for your intercepts’ and an ‘intercept for your slopes’. In the multilevel context this makes sense, because there you know ‘intercept’ is just a fixed effect.

Furthermore, the interaction is with time as a categorical factor, rather than a numeric/continuous effect. So the corresponding mixed model would look something like this:

gpa = gpa %>% mutate(occasion_f = factor(occasion))
mixed_up_mixed = lmer(gpa ~ sex*occasion_f + (occasion|student), data = gpa)
summary(mixed_up_mixed, cor=F)
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ sex * occasion_f + (occasion | student)
   Data: gpa

REML criterion at convergence: 292.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2280 -0.5388 -0.0226  0.5270  3.2147 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 student  (Intercept) 0.043940 0.20962       
          occasion    0.004311 0.06566  -0.14
 Residual             0.042686 0.20661       
Number of obs: 1200, groups:  student, 200

Fixed effects:
                      Estimate Std. Error t value
(Intercept)            2.55474    0.03020  84.603
sexfemale              0.07383    0.04168   1.772
occasion_f1            0.11158    0.03073   3.632
occasion_f2            0.18737    0.03287   5.701
occasion_f3            0.26316    0.03615   7.279
occasion_f4            0.36000    0.04031   8.931
occasion_f5            0.47368    0.04509  10.505
sexfemale:occasion_f1  0.01985    0.04240   0.468
sexfemale:occasion_f2  0.05549    0.04536   1.223
sexfemale:occasion_f3  0.11684    0.04990   2.342
sexfemale:occasion_f4  0.12476    0.05563   2.243
sexfemale:occasion_f5  0.12727    0.06223   2.045

Time-varying covariates

Mixed Model

Similarly, if we had a time-varying covariate, say socioeconomic status, it’d look like the following:

gpa_mixed = lmer(gpa ~ occasion + ses + (1 + occasion|student), data = gpa)

Though we could have a random slope for SES if we wanted. You get the picture. Most of the model is still standard regression interpretation.
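Written out in the notation used at the start, the time-varying covariate contributes a single additional fixed effect (here denoted \(b_{\mathrm{ses}}\), a label of my choosing for illustration):

\[\mathrm{gpa} = (b_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{intercept}}) + (b_{\mathrm{occ}} + \mathrm{re}_{\mathrm{occasion}})\cdot \mathrm{occasion} + b_{\mathrm{ses}}\cdot \mathrm{ses} + \epsilon\]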

LGC

With time varying covariates, i.e. those that can have a different value at each time point, the syntax starts to get tedious. Here we add just one such covariate, \(c\).

model.syntax <- '
  # intercept and slope with fixed coefficients
    i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
    s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4

  # regressions
    i ~ x1 + x2
    s ~ x1 + x2

  # time-varying covariates
    y1 ~ c1
    y2 ~ c2
    y3 ~ c3
    y4 ~ c4
'
fit <- growth(model.syntax, data=Demo.growth)
summary(fit)

Now imagine having just a few such variables, as would be common in most longitudinal settings. In the mixed model framework one would add them as any covariate in a regression model, and each would be associated with a single fixed effect. In the LGC framework, one has to regress each time point of the target variable on its corresponding predictor time point, and it might take a few paragraphs to explain the coefficients for just a handful of covariates. Fixing them to a single value would duplicate the mixed model, but the syntax becomes even more tedious.
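If one did want the single-coefficient behavior of the mixed model, lavaan’s labeled-parameter syntax can constrain the time-specific effects to be equal; a hypothetical sketch, using the same variable names as above:

```r
# Sketch only: giving each regression the same label (b) constrains the
# coefficients to be equal across time, mirroring one fixed effect in the
# mixed model.
model.syntax.constrained <- '
  # intercept and slope with fixed coefficients
    i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
    s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4

  # time-varying covariate with a single constrained coefficient
    y1 ~ b*c1
    y2 ~ b*c2
    y3 ~ b*c3
    y4 ~ b*c4
'
```

And even then, the syntax grows with every additional time-varying covariate.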


  1. Usually we would draw both random effects from a multivariate normal distribution with some covariance.
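In matrix form, that joint distribution would be:

\[\begin{pmatrix} \mathrm{re}_{\mathrm{intercept}} \\ \mathrm{re}_{\mathrm{occasion}} \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau^2 & \rho\tau\varphi \\ \rho\tau\varphi & \varphi^2 \end{pmatrix} \right)\]

where \(\rho\) is the intercept–slope correlation (e.g. the -0.14 seen in the lmer output above).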